Deep-learning-based semantic segmentation of 3D data for autonomous driving has become a well-studied subject, providing methods that can reach very high performance. Nonetheless, because of the limited size of the training datasets, these models cannot see every type of object and scene found in real-world applications. The ability to remain reliable in these various unknown environments is called domain generalization. Despite its importance, domain generalization is relatively unexplored in the case of 3D autonomous driving semantic segmentation. To fill this gap, this paper presents the first benchmark for this application by testing state-of-the-art methods and discussing the difficulty of tackling LiDAR domain shifts. We also propose the first method designed to address this domain generalization, which we call 3DLabelProp. This method leverages the geometry and sequentiality of the LiDAR data to enhance its generalization performance by working on partially accumulated point clouds. It reaches a mIoU of 52.6% on SemanticPOSS while being trained only on SemanticKITTI, making it the state-of-the-art method for generalization (+7.4% better than the second-best method). The code for this method will be available on GitHub.
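For readers unfamiliar with the metric quoted above, mIoU (mean intersection over union) averages the per-class ratio TP / (TP + FP + FN). The following is a minimal sketch in plain Python. The function name `mean_iou` and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions made for illustration; this is not the benchmark's evaluation code.

```python
def mean_iou(predicted, target, num_classes):
    """Average the per-class IoU over classes that occur at least once.

    `predicted` and `target` are equal-length sequences of integer class ids.
    """
    ious = []
    for c in range(num_classes):
        pairs = list(zip(predicted, target))
        tp = sum(1 for p, t in pairs if p == c and t == c)
        fp = sum(1 for p, t in pairs if p == c and t != c)
        fn = sum(1 for p, t in pairs if p != c and t == c)
        union = tp + fp + fn
        if union == 0:
            # Class absent from both prediction and ground truth: skip it
            # (an assumption; some protocols count it differently).
            continue
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Example: class 0 has IoU 1/2, class 1 has IoU 2/3, class 2 never occurs.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=3)
```

Evaluation scripts usually accumulate a confusion matrix over the whole test set before taking the ratio; the per-point loops here are kept only for readability.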
Translated by Google Translate
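Scene completion is the task of completing missing geometry from a partial scan of a scene. Most previous methods compute an implicit representation using a truncated signed distance function (T-SDF) on a 3D grid as input to neural networks. The truncation limits, but does not remove, the ambiguous cases introduced by the sign for non-closed surfaces. As an alternative, we present an unsigned distance function (UDF) called Unsigned Weighted Euclidean Distance (UWED) as the input representation for scene completion neural networks. UWED is simple and efficient as a geometry representation and can be computed on any point cloud; contrary to the usual signed distance function (SDF), UWED does not require normal computation. To obtain explicit geometry, we present a method for extracting a point cloud from UDF values discretized on a regular grid. We compare different SDFs and UDFs for the scene completion task on indoor and outdoor point clouds collected from RGB-D and LiDAR sensors, and show improved completion using the proposed UWED function.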
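To make the idea of an unsigned distance function on a regular grid concrete, here is a minimal sketch that stores, for each voxel center, the truncated Euclidean distance to the nearest input point. It deliberately uses a plain unsigned distance, not the weighted UWED variant, whose weighting is not described in the abstract. The function name, parameters, and brute-force nearest-point search are all assumptions for illustration.

```python
import math


def unsigned_distance_grid(points, origin, voxel_size, dims, truncation):
    """Map each voxel index (i, j, k) to a truncated unsigned distance.

    The distance is measured from the voxel center to the nearest point.
    No surface normals are needed, because the value carries no sign.
    """
    grid = {}
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                center = tuple(
                    o + (n + 0.5) * voxel_size
                    for o, n in zip(origin, (i, j, k))
                )
                # Brute-force search; a k-d tree would be used on real scans.
                nearest = min(math.dist(center, p) for p in points)
                grid[(i, j, k)] = min(nearest, truncation)
    return grid


grid = unsigned_distance_grid(
    points=[(0.5, 0.5, 0.5)],
    origin=(0.0, 0.0, 0.0),
    voxel_size=1.0,
    dims=(3, 1, 1),
    truncation=1.5,
)
```

In this toy grid the first voxel center coincides with the point, the second lies one voxel away, and the third is clipped by the truncation value.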
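LiDAR sensors provide rich 3D information about the surrounding scene and are becoming increasingly important for autonomous vehicle tasks such as semantic segmentation, object detection, and tracking. The ability to simulate a LiDAR sensor will accelerate the testing, validation, and deployment of autonomous vehicles while reducing cost and eliminating the risks of testing in real-world scenarios. To tackle the problem of simulating LiDAR data with high fidelity, we present a pipeline that leverages real-world point clouds acquired by mobile mapping systems. Point-based geometry representations, more specifically splats, have proven their ability to accurately model the underlying surface in very large point clouds. We introduce an adaptive splat generation method that accurately models the underlying 3D geometry, especially for thin structures. We have also developed a faster-than-real-time LiDAR simulation by ray casting on the GPU while focusing on efficiently handling large point clouds. We test our LiDAR simulation in real-world conditions, showing qualitative and quantitative results compared with basic splatting and meshing techniques, which demonstrate the superiority of our modeling technique.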
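Ray casting against a point-based surface comes down to intersecting each simulated laser ray with surface elements. The sketch below assumes that a splat is an oriented disk given by a center, a unit normal, and a radius, and that the ray direction has unit length so the returned value is a range in scene units. It runs on the CPU in plain Python and does not reproduce the adaptive splat generation or the GPU pipeline mentioned above; `ray_disk_hit` is a hypothetical helper name.

```python
import math


def ray_disk_hit(origin, direction, center, normal, radius, eps=1e-9):
    """Return the range along the ray to a disk-shaped splat, or None."""
    denom = sum(d * n for d, n in zip(direction, normal))
    if abs(denom) < eps:
        return None  # Ray is parallel to the splat plane.
    to_center = [c - o for c, o in zip(center, origin)]
    t = sum(v * n for v, n in zip(to_center, normal)) / denom
    if t < 0:
        return None  # Plane lies behind the sensor.
    hit = [o + t * d for o, d in zip(origin, direction)]
    if math.dist(hit, center) > radius:
        return None  # Ray crosses the plane outside the disk.
    return t


# A ray along +z hits a splat facing the sensor, 5 units away.
rng = ray_disk_hit((0, 0, 0), (0, 0, 1), (0.1, 0, 5), (0, 0, -1), 0.5)
```

A full simulator would keep the nearest hit over all candidate splats, typically found through a spatial acceleration structure rather than an exhaustive loop.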
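Transfer learning is a proven technique in 2D computer vision for leveraging the large amount of available data and achieving high performance on datasets that are limited in size by the cost of acquisition or annotation. In 3D, annotation is known to be a costly task; nevertheless, transfer learning methods have only recently been investigated. Because no very large annotated datasets are available, unsupervised pre-training has been heavily favored. In this work, we tackle the case of real-time 3D semantic segmentation of sparse outdoor LiDAR scans. Such datasets have been on the rise, but with different label sets even for the same task. We propose an intermediate-level label set called coarse labels, which allows all the available data to be leveraged without any manual labeling. This way, we gain access to a larger dataset, alongside a simpler semantic segmentation task. With it, we introduce a new pre-training task: coarse label pre-training, also called COLA. We thoroughly analyze the impact of COLA on various datasets and architectures and show that it yields a performance improvement, especially when the fine-tuning task has access only to a small dataset.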
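The coarse-label idea can be pictured as a lookup table that sends each dataset's own fine labels to a shared, smaller label set, so that several datasets can be merged for pre-training. The sketch below is illustrative only: the dataset names, the fine labels, and the coarse categories are hypothetical placeholders, not the label sets used by the authors.

```python
# Hypothetical per-dataset tables from fine labels to shared coarse labels.
COARSE_OF = {
    "dataset_a": {
        "car": "vehicle",
        "truck": "vehicle",
        "road": "ground",
        "sidewalk": "ground",
        "building": "structure",
    },
    "dataset_b": {
        "automobile": "vehicle",
        "pavement": "ground",
        "wall": "structure",
    },
}


def to_coarse(dataset, fine_labels, ignore="unlabeled"):
    """Relabel one scan; fine labels without a coarse match are ignored."""
    table = COARSE_OF[dataset]
    return [table.get(label, ignore) for label in fine_labels]


# Scans from both datasets now share one label vocabulary.
merged = to_coarse("dataset_a", ["car", "road"]) + to_coarse(
    "dataset_b", ["automobile", "pole"]
)
```

Because the mapping is a fixed table, relabeling requires no manual annotation once the table itself has been written.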
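Paris-CARLA-3D is a dataset of several dense colored point clouds built by a mobile LiDAR and camera system. The data consist of two sets: synthetic data from the open-source CARLA simulator (700 million points) and real data acquired in the city of Paris (60 million points), hence the name Paris-CARLA-3D. One advantage of this dataset is that the same LiDAR and camera platform used to produce the real data was simulated in the open-source CARLA simulator. In addition, manual annotation using the semantic tags of CARLA was performed on the real data, allowing transfer methods from synthetic to real data to be tested. The objective of this dataset is to provide a challenging benchmark for evaluating and improving methods on difficult vision tasks for the 3D mapping of outdoor environments: semantic segmentation, instance segmentation, and scene completion. For each task, we describe the evaluation protocol as well as the experiments carried out to establish a baseline.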
We present Kernel Point Convolution (KPConv), a new design of point convolution, that is, one that operates on point clouds without any intermediate representation. The convolution weights of KPConv are located in Euclidean space by kernel points and applied to the input points close to them. Its capacity to use any number of kernel points gives KPConv more flexibility than fixed grid convolutions. Furthermore, these locations are continuous in space and can be learned by the network. Therefore, KPConv can be extended to deformable convolutions that learn to adapt kernel points to local geometry. Thanks to a regular subsampling strategy, KPConv is also efficient and robust to varying densities. Whether they use deformable KPConv for complex tasks or rigid KPConv for simpler tasks, our networks outperform state-of-the-art classification and segmentation approaches on several datasets. We also offer ablation studies and visualizations to provide an understanding of what has been learned by KPConv and to validate the descriptive power of deformable KPConv.
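The description above, weights carried by kernel points and applied to nearby input points, can be sketched for a single output location as follows. The sketch assumes a rigid kernel and a linear influence that falls from 1 at a kernel point to 0 at distance `sigma`; the function name, argument layout, and plain-Python loops are illustrative choices, not the authors' implementation.

```python
import math


def kpconv_at(center, neighbors, features, kernel_points, weights, sigma):
    """Evaluate one kernel point convolution at `center`.

    neighbors:     neighbor positions around `center`
    features:      one input feature vector per neighbor
    kernel_points: kernel point positions relative to `center`
    weights:       one (d_in x d_out) matrix per kernel point
    """
    d_out = len(weights[0][0])
    out = [0.0] * d_out
    for pos, feat in zip(neighbors, features):
        rel = [p - c for p, c in zip(pos, center)]
        for kp, w in zip(kernel_points, weights):
            # Influence of this kernel point on the neighbor (assumed linear).
            h = max(0.0, 1.0 - math.dist(rel, kp) / sigma)
            if h == 0.0:
                continue
            for i, f in enumerate(feat):
                for o in range(d_out):
                    out[o] += h * f * w[i][o]
    return out


# One neighbor with a scalar feature, two kernel points, two output channels.
out = kpconv_at(
    center=(0.0, 0.0, 0.0),
    neighbors=[(1.0, 0.0, 0.0)],
    features=[[2.0]],
    kernel_points=[(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    weights=[[[1.0, 0.0]], [[0.0, 1.0]]],
    sigma=2.0,
)
```

In a deformable variant, the kernel point positions would be shifted by offsets predicted from the input features before the influence is computed; that step is omitted here.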